Backend Scaling & Performance – Structured Notes
Edge Computing
- Edge computing refers to processing requests at CDN edge nodes instead of central servers
- Traditional CDNs:
  - Serve static content only (JS, CSS, images, videos)
  - No computation or logic involved
- Edge computing:
  - Adds a processing layer at CDN nodes
  - Executes logic before returning a response
Why Edge Computing is Faster
- Edge nodes are:
  - Closer to users (geographically)
  - More numerous than data centers
- Reduces round-trip latency
- Even if computation time is the same, network latency is lower
Example: Authentication
- Traditional:
  - User → Server → DB check → Response (~100 ms)
- Edge:
  - User → Edge → Validate session → Response (~2–3 ms)
- Benefits:
  - Faster rejection of unauthorized requests
  - Reduces load on main servers
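The edge-side check above is fast because a signed session token can be validated locally, with no database round trip. A minimal sketch of the idea in Python (the `sign_session`/`validate_session` helpers and the shared secret are hypothetical illustrations, not a real edge-runtime API):

```python
import hashlib
import hmac

# Hypothetical secret distributed to all edge nodes out of band.
SECRET = b"edge-shared-secret"

def sign_session(user_id: str) -> str:
    """Issue a token the edge can later verify without calling the origin DB."""
    sig = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{sig}"

def validate_session(token: str) -> bool:
    """Edge-side check: recompute the HMAC locally (no DB round trip)."""
    user_id, _, sig = token.partition(".")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

A request with a bad token is rejected at the edge node itself, so only authenticated traffic ever reaches the main servers.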
Use Cases
- Authentication validation
- User localization (language, region-based content)
- Request filtering and routing
- User preference customization
Infrastructure Context
- CDN nodes are:
  - Often deployed via ISPs
  - More distributed than data centers
- Data centers:
  - Expensive, resource-heavy
- Edge nodes:
  - Lightweight, closer to users
Limitations
- Limited resources:
  - Low RAM (e.g., ~1 GB)
  - Limited CPU
- Runtime constraints:
  - Cannot access the file system
  - Limited protocol support (e.g., TCP restrictions)
- Example:
  - Cloudflare Workers use V8 isolates (a JavaScript runtime)
Conclusion
- Edge computing is useful for:
  - Low-latency, lightweight logic
- Cannot replace primary servers
- Best used alongside central infrastructure
Asynchronous Processing
- Used to reduce perceived latency
- Moves non-critical tasks out of request-response cycle
Synchronous Processing
- Request → Processing → DB → External API → Response
- Example:
  - Invite user → DB update + Email → Response (~400 ms)
Asynchronous Processing
- Request → DB update → Immediate Response (~100 ms)
- Background:
  - Task pushed to queue
  - Worker processes it later
Key Components
- Producer:
  - Pushes jobs/tasks into the queue
- Queue:
  - Redis, RabbitMQ, Kafka
- Consumer (Worker):
  - Picks up and executes tasks
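The producer → queue → consumer flow can be sketched with Python's standard library; the in-memory `queue.Queue` here is purely an illustrative stand-in for Redis, RabbitMQ, or Kafka:

```python
import queue
import threading

job_queue = queue.Queue()  # stand-in for Redis / RabbitMQ / Kafka
sent_emails = []           # records what the "email API" was asked to send

def producer(user_email: str) -> str:
    """Request handler: enqueue the slow work and respond immediately."""
    job_queue.put({"type": "send_email", "to": user_email})
    return "202 Accepted"  # the user is not kept waiting for the email API

def worker() -> None:
    """Consumer: picks up jobs from the queue and executes them."""
    while True:
        job = job_queue.get()
        if job is None:  # sentinel value: shut the worker down
            break
        sent_emails.append(job["to"])  # pretend this calls the email API
        job_queue.task_done()

t = threading.Thread(target=worker)
t.start()
response = producer("alice@example.com")
job_queue.join()     # wait only so this demo can verify the result
job_queue.put(None)  # stop the worker
t.join()
```

In production the producer and consumer run in separate processes (often on separate machines), which is what lets them scale independently.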
Example: Sending Email
- Instead of waiting for the email API:
  - Push the job to the queue
  - Respond immediately to the user
- Improves user experience significantly
Example: Video Processing
- Upload completes
- Background tasks:
  - Thumbnail generation
  - Encoding
  - Subtitles
- User does not wait for processing
Example: Delete Account
- Synchronous:
  - Delete millions of records → ~8-second delay
- Asynchronous:
  - Respond immediately
  - Perform deletion in the background
When to Use
- Tasks not requiring immediate feedback:
  - Emails
  - Notifications
  - Image/video processing
  - Data cleanup
Tools
- Redis queues
- RabbitMQ
- Kafka
- Libraries:
  - BullMQ (Node.js)
Microservices Architecture
Monolith
- Single codebase and deployable unit
- All modules (auth, payments, notifications) in one system
Advantages
- Easy to develop, test, deploy
- Simple debugging
- Centralized codebase
Disadvantages
- Hard to scale individual components
- Deployment dependencies between modules
Microservices
- Application split into independent services
Key Idea
- Focus is on scaling teams, not just systems
Advantages
- Independent scaling of services
- Independent deployments
- Flexibility in tech stack, e.g.:
  - Node.js for APIs
  - Go/Rust for performance-critical tasks
Problems with Monolith (Motivation)
- Deployment conflicts between teams
- Cannot scale specific modules independently
- Tech stack limitations
Disadvantages of Microservices
- Network overhead:
  - Function calls → network calls
- Failure-handling complexity:
  - Retries, timeouts
- Debugging difficulty:
  - Requires distributed tracing
- Data consistency issues:
  - Multiple databases → replication lag
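Because function calls become network calls, every cross-service call needs explicit failure handling. A hedged sketch of retry-with-backoff logic; `flaky_service` is a hypothetical stand-in for an HTTP call to another service:

```python
import time

def call_with_retries(call, retries=3, backoff=0.01):
    """Retry a flaky cross-service call; re-raise after the final attempt."""
    for attempt in range(retries):
        try:
            return call()
        except TimeoutError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))  # exponential backoff

# Simulated flaky downstream service: times out twice, then succeeds.
attempts = {"count": 0}

def flaky_service():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("downstream timed out")
    return {"status": "ok"}

result = call_with_retries(flaky_service)
```

In a monolith this failure mode simply does not exist, which is one reason the notes warn that microservices trade simplicity for scalability.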
When to Use Microservices
- Large teams (100+ developers)
- Independent scaling needs
- Different technology requirements
When Not to Use
- Small teams/startups
- Early-stage applications
Serverless Computing
Traditional Model
- Run application on servers (VMs)
- Responsibilities:
  - Provisioning
  - Configuration
  - Maintenance
- Cost:
  - Pay 24/7 regardless of usage
Problems with Traditional Servers
- Capacity planning:
  - Hard to predict traffic
- Underprovisioning:
  - Crashes, slow responses
- Overprovisioning:
  - High cost
- Autoscaling limitations:
  - Slow scaling (seconds–minutes)
  - Reactive, not proactive
  - Cost unpredictability
Serverless Model
- Developer provides:
  - Code (functions)
  - Event triggers
- Cloud provider handles:
  - Infrastructure
  - Scaling
Architecture
- User → API Gateway → Function execution
Key Differences
- No server management
- Pay per execution (not uptime)
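In this model the unit of deployment is just a function invoked per event. A minimal sketch of the handler shape, modeled loosely on AWS Lambda's `handler(event, context)` convention; the event fields used here are hypothetical:

```python
def handler(event, context=None):
    """Invoked by the platform per event; billed per execution, not uptime."""
    name = event.get("queryStringParameters", {}).get("name", "world")
    return {"statusCode": 200, "body": f"hello, {name}"}

# In production the API gateway constructs the event from the HTTP request;
# here we simulate a single invocation directly.
response = handler({"queryStringParameters": {"name": "edge"}})
```

Between invocations nothing is guaranteed to survive, which is what the statelessness constraints below are about.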
Benefits
- Cost-efficient
- Automatic scaling
- No infrastructure management
Problems
Cold Start
- Delay when spinning up new instance
Execution Limits
- Example:
  - AWS Lambda → ~15 minutes max
Statelessness
- No persistent connections
- No local state
- Requires redesign of:
  - DB connections
  - WebSockets
Solutions to Cold Start
- Keep instances warm (periodic pinging)
- Use lightweight runtimes:
  - Firecracker (AWS)
  - V8 isolates (Cloudflare)
Good Use Cases
- Event-driven systems
- Image/video processing
- Background jobs
Bad Use Cases
- Low-latency critical systems
- Long-running processes
- Stateful applications
Key Principles for Scaling & Performance
1. Measure First
- Always identify bottlenecks before optimizing
- Use:
  - Logs
  - Metrics
  - Traces
- Tools:
  - Prometheus
  - Grafana
  - New Relic
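Measuring before optimizing can start as simply as timing each code path and recording it as a metric. A minimal sketch; the in-memory `metrics` dict is a stand-in for a real backend such as Prometheus:

```python
import time
from collections import defaultdict

metrics = defaultdict(list)  # stand-in for a real metrics backend

def timed(name):
    """Decorator: record each call's latency under a metric name."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                metrics[name].append(time.perf_counter() - start)
        return inner
    return wrap

@timed("db_query_seconds")
def fetch_user(user_id):
    time.sleep(0.01)  # simulate a slow DB call -- the bottleneck to find
    return {"id": user_id}

fetch_user(1)
fetch_user(2)
```

Once latencies are recorded per code path, the slowest one identifies where optimization effort should actually go.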
2. Avoid Premature Optimization
- Optimize only after identifying real issues
- Example:
  - Use indexing before adding caching
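The indexing-before-caching point can be seen directly in SQLite: adding an index changes the query plan from a full table scan to an index search, often removing the need for a cache at all. A small illustrative sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(1000)],
)

def plan(sql):
    """Return SQLite's query plan as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(str(r) for r in rows)

query = "SELECT id FROM users WHERE email = 'user500@example.com'"
before = plan(query)  # full table scan of all rows
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)   # index search: only matching rows are touched
```

The simple fix (an index) lives in one place; a cache would add invalidation logic and a second source of truth.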
3. Prefer Simple Solutions
- Simpler systems:
  - Easier to debug
  - Easier to maintain
- Complexity adds:
  - Failure points
  - Operational overhead
4. Scale Based on Need
- Do not design for massive scale initially
- Build with:
  - Current needs
  - Some buffer
5. Observability is Critical
- Implement from day one:
  - Logging
  - Metrics
  - Tracing
- Helps:
  - Detect issues early
  - Plan scaling
6. Performance is a Mindset
- Continuous process:
  - Build → Measure → Optimize
- Focus:
  - Diagnose problems quickly
  - Handle failures gracefully
Final Insight
- Scaling techniques are solutions, not defaults
- Always:
  - Understand the problem
  - Measure system behavior
  - Apply the right solution accordingly